Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-91049: Introduce set vectorcall field API for PyFunctionObject #92257

Merged

Conversation

adphrost
Copy link
Contributor

@adphrost adphrost commented May 3, 2022

Avoid specializing functions with overridden vectorcall field

@adphrost adphrost requested a review from markshannon as a code owner May 3, 2022 19:33
@bedevere-bot
Copy link

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

@adphrost
Copy link
Contributor Author

adphrost commented May 3, 2022

@markshannon Can you let me know if there should be any other checks in ceval.c besides the one added in this PR?

For context, this check fixes an edge case where calling function with a set vectorfield only worked with a unpacked tuple, e.g. f(1) failed but f(*(1,)) succeeded (now both succeed)

Copy link
Contributor

@itamaro itamaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this @adphrost !

Lib/test/test_call.py Outdated Show resolved Hide resolved
Lib/test/test_call.py Outdated Show resolved Hide resolved
Python/ceval.c Outdated
@@ -4792,7 +4792,7 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
PyObject *function = PEEK(total_args + 1);
int positional_args = total_args - KWNAMES_LEN();
// Check if the call can be inlined or not
if (Py_TYPE(function) == &PyFunction_Type && tstate->interp->eval_frame == NULL) {
if (Py_TYPE(function) == &PyFunction_Type && tstate->interp->eval_frame == NULL && ((PyFunctionObject *)function)->vectorcall == _PyFunction_Vectorcall) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably exceeds line length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the max line length allowed? And is there a linter I can run to handle this for me?


This PR introduces an API call where vectorcall field can be modified like so:

`void PyFunction_SetVectorcall(PyFunctionObject *func, vectorcallfunc vectorcall);`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the function in Doc/c-api/function.rst and in the NEWS entry, refer to use with:

:c:func:`PyFunction_SetVectorcall`

New public functions should also be documented in What's New in Python 3.11 > C API > New features: Doc/whatsnew/3.11.rst.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments.
Have you attempted to measure the performance impact of this? (I'd expect it to be small or negligible).

Objects/funcobject.c Outdated Show resolved Hide resolved
Modules/_testcapimodule.c Outdated Show resolved Hide resolved
Modules/_testcapimodule.c Outdated Show resolved Hide resolved
Change how assertion is done

Co-authored-by: Itamar Ostricher <itamarost@gmail.com>
@cpython-cla-bot
Copy link

cpython-cla-bot bot commented May 3, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@itamaro
Copy link
Contributor

itamaro commented May 4, 2022

Have you attempted to measure the performance impact of this? (I'd expect it to be small or negligible).

@markshannon is there a simple way to ask a buildbot to run pyperformance comparing this against the base revision? we can do that manually, but a buildbot doing it for us would be nice :)

@itamaro
Copy link
Contributor

itamaro commented May 19, 2022

circling back to @markshannon 's performance impact question. -- tldr Geometric mean: 1.00x faster

some more details:

rebased this PR onto a4460f2 and ran pyperformance on a4460f2 and on the rebased branch

both baseline and branch were built using ./configure --enable-optimizations --with-lto && make -j

used pyperformance 1.0.5 (installed directly from GitHub) on bare metal AWS machine (Linux-5.13.0-1023-aws-x86_64-with-glibc2.31 with 72 logical CPUs), and excluded the sqlalchemy benchmarks because they tried building greenlet and failed

pyperformance run --benchmarks=-sqlalchemy_declarative,-sqlalchemy_imperative --python builds/3.12-opt/python -o main.json --rigorous
pyperformance run --benchmarks=-sqlalchemy_declarative,-sqlalchemy_imperative --python builds/3.12-gh-91049-vectorcall-opt/python -o gh-91049.json --rigorous

compare results:

pyperf compare_to main.json gh-91049.json -G
Slower (20):
- regex_effbot: 3.02 ms +- 0.01 ms -> 3.17 ms +- 0.01 ms: 1.05x slower
- meteor_contest: 110 ms +- 2 ms -> 114 ms +- 4 ms: 1.04x slower
- pidigits: 200 ms +- 0 ms -> 208 ms +- 0 ms: 1.04x slower
- telco: 7.17 ms +- 0.21 ms -> 7.39 ms +- 0.34 ms: 1.03x slower
- html5lib: 64.0 ms +- 2.7 ms -> 65.6 ms +- 2.6 ms: 1.02x slower
- unpickle_list: 5.30 us +- 0.08 us -> 5.40 us +- 0.05 us: 1.02x slower
- pyflate: 430 ms +- 3 ms -> 438 ms +- 4 ms: 1.02x slower
- sympy_sum: 168 ms +- 2 ms -> 171 ms +- 2 ms: 1.02x slower
- xml_etree_iterparse: 110 ms +- 1 ms -> 112 ms +- 3 ms: 1.02x slower
- async_tree_memoization: 640 ms +- 25 ms -> 650 ms +- 25 ms: 1.01x slower
- chaos: 71.8 ms +- 0.7 ms -> 72.8 ms +- 0.6 ms: 1.01x slower
- hexiom: 6.71 ms +- 0.06 ms -> 6.80 ms +- 0.06 ms: 1.01x slower
- sympy_str: 299 ms +- 3 ms -> 302 ms +- 4 ms: 1.01x slower
- spectral_norm: 101 ms +- 1 ms -> 103 ms +- 1 ms: 1.01x slower
- sympy_integrate: 21.5 ms +- 0.2 ms -> 21.8 ms +- 0.2 ms: 1.01x slower
- logging_silent: 105 ns +- 1 ns -> 107 ns +- 3 ns: 1.01x slower
- raytrace: 303 ms +- 2 ms -> 306 ms +- 4 ms: 1.01x slower
- regex_v8: 23.1 ms +- 0.3 ms -> 23.3 ms +- 0.1 ms: 1.01x slower
- unpickle_pure_python: 239 us +- 2 us -> 240 us +- 2 us: 1.01x slower
- 2to3: 282 ms +- 2 ms -> 283 ms +- 1 ms: 1.00x slower

Faster (25):
- genshi_xml: 57.1 ms +- 2.7 ms -> 54.3 ms +- 1.8 ms: 1.05x faster
- scimark_sparse_mat_mult: 4.93 ms +- 0.06 ms -> 4.70 ms +- 0.10 ms: 1.05x faster
- scimark_lu: 112 ms +- 2 ms -> 109 ms +- 2 ms: 1.03x faster
- json_dumps: 13.3 ms +- 0.2 ms -> 12.9 ms +- 0.1 ms: 1.03x faster
- scimark_fft: 337 ms +- 3 ms -> 331 ms +- 3 ms: 1.02x faster
- nbody: 97.6 ms +- 1.8 ms -> 95.9 ms +- 1.8 ms: 1.02x faster
- pathlib: 23.0 ms +- 1.2 ms -> 22.6 ms +- 0.8 ms: 1.02x faster
- pickle_pure_python: 324 us +- 3 us -> 318 us +- 3 us: 1.02x faster
- genshi_text: 23.2 ms +- 0.4 ms -> 22.8 ms +- 0.3 ms: 1.02x faster
- unpack_sequence: 46.5 ns +- 0.6 ns -> 45.7 ns +- 0.7 ns: 1.02x faster
- fannkuch: 401 ms +- 6 ms -> 396 ms +- 4 ms: 1.01x faster
- chameleon: 7.06 ms +- 0.06 ms -> 6.98 ms +- 0.08 ms: 1.01x faster
- pickle_dict: 27.5 us +- 0.1 us -> 27.2 us +- 1.0 us: 1.01x faster
- pickle: 10.2 us +- 0.1 us -> 10.1 us +- 0.1 us: 1.01x faster
- xml_etree_parse: 170 ms +- 2 ms -> 168 ms +- 3 ms: 1.01x faster
- xml_etree_process: 56.8 ms +- 0.5 ms -> 56.3 ms +- 0.5 ms: 1.01x faster
- nqueens: 88.0 ms +- 1.1 ms -> 87.2 ms +- 1.1 ms: 1.01x faster
- json_loads: 26.3 us +- 0.3 us -> 26.1 us +- 0.2 us: 1.01x faster
- async_tree_none: 549 ms +- 15 ms -> 544 ms +- 12 ms: 1.01x faster
- async_tree_cpu_io_mixed: 771 ms +- 13 ms -> 766 ms +- 10 ms: 1.01x faster
- mako: 10.5 ms +- 0.0 ms -> 10.4 ms +- 0.1 ms: 1.01x faster
- crypto_pyaes: 77.2 ms +- 0.8 ms -> 76.8 ms +- 0.9 ms: 1.01x faster
- xml_etree_generate: 81.2 ms +- 0.6 ms -> 80.8 ms +- 0.5 ms: 1.00x faster
- regex_compile: 142 ms +- 1 ms -> 141 ms +- 1 ms: 1.00x faster
- scimark_sor: 121 ms +- 1 ms -> 120 ms +- 1 ms: 1.00x faster

Benchmark hidden because not significant (18): async_tree_io, deltablue, django_template, dulwich_log, float, go, logging_format, logging_simple, pickle_list, python_startup, python_startup_no_site, regex_dna, richards, scimark_monte_carlo, sqlite_synth, sympy_expand, tornado_http, unpickle

Geometric mean: 1.00x faster

thanks @ansley for doing the perf analysis on this :)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. This change reminds me my old https://peps.python.org/pep-0510/ nice :-)

Doc/c-api/function.rst Outdated Show resolved Hide resolved
static PyObject *
override_vectorcall(
PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwnames) {
Py_RETURN_NONE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you modify the test to return the string "overridden" instead of just returning None?

@@ -236,6 +236,10 @@ New Features
an additional metaclass argument.
(Contributed by Wenzel Jakob in :gh:`93012`.)

* Add new function :c:func:`PyFunction_SetVectorcall` to the C API
which sets the vectorcall field of a given PyFunctionObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may explain the purpose of the function here, like mention Cinder JIT and Pyjion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not have a "purpose". We should state clearly what the function does, not bless some particular use case.

@adphrost
Could you add a warning that the new function pointer must have exactly the same behavior as the unaltered function.
I am a little concerned that C extensions will abuse this, and we will have deal with the bug reports.

CPython extensions providing optimized execution of Python bytecode (like Cinder JIT and Pyjion)
can benefit from being able to modify the vectorcall field on instances of PyFunctionObject to allow calling the optimized path (e.g. JIT-compiled) directly.

This PR introduces an API call where vectorcall field can be modified like so:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text lands into https://docs.python.org/dev/whatsnew/changelog.html#changelog which is a changlog, you should not mention "a PR" there.

I suggest to just copy/paste what you wrote in whatsnew.

@@ -236,6 +236,10 @@ New Features
an additional metaclass argument.
(Contributed by Wenzel Jakob in :gh:`93012`.)

* Add new function :c:func:`PyFunction_SetVectorcall` to the C API
which sets the vectorcall field of a given PyFunctionObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
which sets the vectorcall field of a given PyFunctionObject
which sets the vectorcall field of a given :c:type:`PyFunctionObject`.

@itamaro
Copy link
Contributor

itamaro commented Jun 25, 2022

@vstinner @markshannon I think all comments were addressed - any further feedback?

@@ -83,6 +83,12 @@ There are a few functions specific to Python functions.
Raises :exc:`SystemError` and returns ``-1`` on failure.


.. c:function:: void PyFunction_SetVectorcall(PyFunctionObject *func, vectorcallfunc vectorcall)

Set the vectorcall field of a given function object *func*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Set the vectorcall field of a given function object *func*
Set the vectorcall field of a given function object *func*.

@@ -236,6 +236,12 @@ New Features
an additional metaclass argument.
(Contributed by Wenzel Jakob in :gh:`93012`.)

* Add new function :c:func:`PyFunction_SetVectorcall` to the C API
which sets the vectorcall field of a given :c:type:`PyFunctionObject`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
which sets the vectorcall field of a given :c:type:`PyFunctionObject`
which sets the vectorcall field of a given :c:type:`PyFunctionObject`.

* Add new function :c:func:`PyFunction_SetVectorcall` to the C API
which sets the vectorcall field of a given :c:type:`PyFunctionObject`
Warning: extensions using this API must preserve the behavior
of the unaltered function!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this document is the right place to put a warning. Why this warning is not in the documentation of the function?

which sets the vectorcall field of a given :c:type:`PyFunctionObject`

Warning: extensions using this API must preserve the behavior
of the unaltered function!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this warning.

@@ -0,0 +1,5 @@
Add new function :c:func:`PyFunction_SetVectorcall` to the C API
which sets the vectorcall field of a given :c:type:`PyFunctionObject`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
which sets the vectorcall field of a given :c:type:`PyFunctionObject`
which sets the vectorcall field of a given :c:type:`PyFunctionObject`.

Python/ceval.c Outdated Show resolved Hide resolved
@itamaro
Copy link
Contributor

itamaro commented Jul 27, 2022

@vstinner @markshannon thanks for the feedback. with @markshannon 's last response I think the remaining feedback is minor styling and phrasing. if that's all that left for this to be merged, I'm happy to perform these changes, but also totally ok with a core dev making these last changes when merging 😄

@itamaro
Copy link
Contributor

itamaro commented Sep 6, 2022

pushed a fix to LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN to deopt on function version instead of argcount

@markshannon I also reviewed the interpreter loop and didn't found any other cases that inline calls without deopting on function version. the methodology I used is searching for occurrences of inlined_py_calls - let me know if this should cover everything!

(btw, I noticed that BINARY_SUBSCR_GETITEM contains CALL_STAT_INC(inlined_py_calls) twice - is that intentional?)

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parts touching the specializer and specialized opcodes LGTM.

@Fidget-Spinner
Copy link
Member

@itamaro

(btw, I noticed that BINARY_SUBSCR_GETITEM contains CALL_STAT_INC(inlined_py_calls) twice - is that intentional?)

No that's wrong.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor formatting nit. Otherwise looks good.

Python/ceval.c Outdated Show resolved Hide resolved
@itamaro
Copy link
Contributor

itamaro commented Sep 7, 2022

(btw, I noticed that BINARY_SUBSCR_GETITEM contains CALL_STAT_INC(inlined_py_calls) twice - is that intentional?)

No that's wrong.

I'll file a separate issue for that

@markshannon
Copy link
Member

markshannon commented Sep 7, 2022

All looks good to me.
@vstinner anything else before I merge this?

Modules/_testcapi/vectorcall.c Outdated Show resolved Hide resolved
Modules/_testcapi/vectorcall.c Outdated Show resolved Hide resolved
Objects/funcobject.c Show resolved Hide resolved
Python/ceval.c Show resolved Hide resolved
Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test LGTM in general but I have a few comments. Thanks!

Lib/test/test_call.py Outdated Show resolved Hide resolved
Lib/test/test_call.py Outdated Show resolved Hide resolved
Lib/test/test_call.py Outdated Show resolved Hide resolved
1. test specialization doesn't trigger when vectorcall is already overridden
2. test specialization gets deopted when vectorcall is overridden after specialization triggered
@gvanrossum
Copy link
Member

Almost ready. Who's on the hook to get this merged?

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now.

@markshannon
Copy link
Member

I'll merge once the conflict has been resolved.

@itamaro
Copy link
Contributor

itamaro commented Sep 15, 2022

I'll merge once the conflict has been resolved.

resolved!

@markshannon markshannon merged commit a41ed97 into python:main Sep 15, 2022
@vstinner
Copy link
Member

PyFunction_SetVectorcall() is really a cool feature, I love it! I'm not sure yet how people will manage to abuse it, but I love it :-D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants